Software Pipelining of Loops with Early Exits for the Itanium™ Architecture
Abstract
The Itanium architecture contains many features to enhance parallel execution, such as an explicitly parallel (EPIC) instruction set, large register files, predication, and support for speculation. It also contains features such as register rotation to support efficient software pipelining of loops. Software-pipelining techniques have been shown to significantly improve the performance of loop-intensive scientific programs that have only one exit per loop. However, loops with multiple exits, which are often encountered in control-intensive non-numeric programs, pose a special challenge to such techniques. This paper describes the schema for generating software-pipelined loops using Itanium architecture features. It then presents two new methods for transforming a loop with multiple exits so that it can be efficiently software-pipelined on Itanium. Both methods apply control-flow transformations that convert a loop with multiple exits into a loop with a single exit. This is followed by if-conversion to create a loop consisting of a single basic block. These methods improve on existing techniques in that they yield better values of the initiation interval (II) for pipelined loops and smaller code sizes. One of these methods has been implemented in the Intel optimizing compiler for the Itanium architecture. This method is compared with the technique suggested by Tirumalai et al., and we show that it performs better on loops of the benchmark programs in SPECint2000.
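As a rough source-level illustration of the idea in the abstract (not a reproduction of either of the paper's two methods), the C sketch below shows a search loop with an early exit next to a hand-written equivalent that has a single exit and a branch-free body. The function names `find_multi_exit` and `find_single_exit` and the variables `hit` and `live` are hypothetical, chosen only for this example. In a real compiler the transformation is performed on the intermediate representation, and the guarded updates would map onto Itanium predicated instructions rather than C conditional expressions.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: two exits, the early "found" return and the normal
   fall-through when the index reaches n. */
int find_multi_exit(const int *a, int n, int key) {
    assert(n == 0 || a != NULL);
    for (int i = 0; i < n; i++) {
        if (a[i] == key)
            return i;          /* early exit */
    }
    return -1;                 /* normal exit */
}

/* Hand-transformed form: one exit test at the bottom of the loop and a
   straight-line body. `hit` and `live` play the role of predicates. */
int find_single_exit(const int *a, int n, int key) {
    assert(n == 0 || a != NULL);
    int i = 0;
    int result = -1;
    int live = (n > 0);        /* guards the first iteration */
    while (live) {
        int hit = (a[i] == key);        /* early-exit condition as a value */
        result = hit ? i : result;      /* update guarded by `hit` */
        i += 1;
        live = !hit & (i < n);          /* both exit conditions merged */
    }
    return result;
}
```

Both functions return the index of the first match, or -1 when there is none. The second form has one loop-closing test and no control flow inside the body, which is the shape of loop that the abstract says can be software-pipelined.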
Similar resources
Spatial Software Pipelining on Distributed Architectures for Sparse Matrix Codes
Wire delays and communication time are forcing processors to become decentralized modules communicating through a fast, scalable interconnect. For scalability, every portion of the processor must be decentralized, including the memory system. Compilers that can take a sequential program as input and parallelize it (including the memory) across the new processors are necessary. Much research has...
Maximizing Pipelined Functional Units Usage for Minimum Power Software Pipelining
This paper presents a new power-aware software pipelining method which can minimize power consumption of software pipelined loops on VLIW architecture without sacrificing performance. Our method is motivated by the following facts: (1) functional units in modern architectures are fully pipelined; (2) in a loop body, there exist instructions which are not on critical (recurrence) cycle(s). Trad...
Iteration Mapping: Loop Software Pipelining on an XIMD
The multiple instruction streams, low synchronization cost and synchronous nature of the XIMD (variable instruction stream, multiple data stream) architecture create an opportunity for a new architecture-compiler interface. As an extension to the VLIW (Very Long Instruction Word) architecture, the XIMD can exploit all VLIW scheduling techniques but these do not take full advantage of the unique...
Software pipelining of nested loops for real-time DSP applications
Modern DSP processors have been integrated with Instruction-Level Parallelism (ILP), which presents a challenge to exploit ILP within DSP applications. Software pipelining is an efficient technique used to expose ILP for loop programs and has been widely used for current microprocessors. It has been recently used in DSP compilers, but only for the innermost loops. This paper proposes a new approac...
Software Pipelining and Register Pressure in VLIW Architectures: Preconditioning Data Dependence Graphs is Experimentally Better than Lifetime-Sensitive Scheduling
Embedding register-pressure control in software pipelining heuristics is the dominant approach in modern back-end compilers. However, aggressive attempts at combining resource and register constraints in software pipelining have failed to scale to real-life loops, leaving weaker heuristics as the only practical solutions. We propose a decoupled approach where register pressure is controlled bef...